Accurate recognition of cis-regulatory motifs with the correct lengths in prokaryotic genomes
نویسندگان
چکیده
We present a new computational method for solving a classical problem, the identification problem of cis-regulatory motifs in a given set of promoter sequences, based on one key new idea. Instead of scoring candidate motifs individually like in all the existing motif-finding programs, our method scores groups of candidate motifs with similar sequences, called motif closures, using a P-value, which has substantially improved the prediction reliability over the existing methods. Our new P-value scoring scheme is sequence length independent, hence allowing direct comparisons among predicted motifs with different lengths on the same footing. We have implemented this method as a Motif Recognition Computer (MREC) program, and have extensively tested MREC on both simulated and biological data from prokaryotic genomes. Our test results indicate that MREC can accurately pick out the actual motif with the correct length as the best scoring candidate for the vast majority of the cases in our test set. We compared our prediction results with two motif-finding programs Cosmo and MEME, and found that MREC outperforms both programs across all the test cases by a large margin. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.
منابع مشابه
An integrative and applicable phylogenetic footprinting framework for <i>cis</i>-regulatory motifs identification in prokaryotic genomes
Background: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif p...
متن کاملGenome-wide de novo prediction of cis-regulatory binding sites in prokaryotes
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, w...
متن کاملGeneMarkS-2: Raising Standards of Accuracy in Gene Recognition
Motivation: Ab initio gene prediction in prokaryotic genomes is supposed to be so accurate that RNASeq data are rarely produced to bring in an additional layer of evidence. In 2016 more than 60,000 prokaryotic genomes were re-annotated by the NCBI pipeline. Given the sheer volume of prokaryotic DNA data flowing from next generation sequencing facilities into genome databases, the annotation acc...
متن کاملDiscovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines
An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational p...
متن کاملMicroFootPrinter: a tool for phylogenetic footprinting in prokaryotic genomes
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the most conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose in prokaryotic genomes, making use of the phylogenetic relationship...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 38 شماره
صفحات -
تاریخ انتشار 2010